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Abstract. The main object of this paper is to present some general concepts of Bayesian 
inference and more specifically the estimation of the hyperparameters in inverse problems. 
We consider a general linear situation where we are given some data y related to the 
unknown parameters x by y — Ax + n and where we can assign the probability laws 
p(x\0), p(y\x,/3), p(/3) and p{6). The main discussion is then how to infer x, 9 and /3 
either individually or any combinations of them. Different situations are considered and 
discussed. As an important example, we consider the case where 6 and (3 are the precision 
parameters of the Gaussian laws to whom we assign Gamma priors and we propose some 
new and practical algorithms to estimate them simultaneously. Comparisons and links 
with other classical methods such as maximum likelihood are presented. 
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1. Introduction 

In a general Bayesian inference, we have the data y, a known relation between 
the unknown parameters x and y and finally the hyperparameters f3 and 6. The 
Bayesian estimation technique is now well established % §, |, 1 1, § § and has 
been used since many years to resolve the inverse problems in signal and image 
reconstruction and restoration 0,[l|,[l|,[3H|,[7[^||o[|l|- 



The first step before applying the Bayes' rule is to assign the prior probability 
laws p(x\6), p(y\x, f3), p(9) and p(f3). The next step is to determine the posterior 
laws and then to infer the unknowns. In this paper we are focusing more on the 
second step than on the first step. So we assume that all the direct probability 
laws are known. 

The main object of this paper is to show how can we infer simultaneously the 
unknown parameters x and the hyperparameters (3 and 6 from the data y. 

Before going more in details let us give one example. This will permit us to 
fix the situations. Consider the case where the unknown parameters x represent 
the pixel values of an unobserved image and the data y are the pixel values of an 
observed image which is assumed to be a degraded version of it. If we consider a 
linear degradation we have 

y = Ax + n, (1) 



where A is a (m x n) matrix representing the degradation process and n represents 
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the measurement uncertainty (noise) which is assumed to be additive, centered, 
white, Gaussian and independent of x. This hypothesis leads us to 

p ( y \ X '® = Z~(j3j ex p|-^(y - Ax)\v - Axyt ■ (2) 

In this case (3 is a positive parameter which is related to the noise variance a\ by 
(3 = l/of and Zi(/3) = (2/3/7r)" 1 / 2 is the normalizing factor. 
Consider also, for this example, a Gaussian prior law for x : 

p(x\8) = ^—exp{~9cl>(x)\ with <j>(x) = x'P^x, (3) 

where 9 = \ j<j\ is a positive parameter, Pq is the a priori covariance matrix of x 
and Z 2 (6) = (29/TT) n / 2 \P \ 1/2 . 

A well known case is the situation where 9, (3 and Pq are known and we only 
want to estimate x. In fact, in this special case, the joint law p{y 1 x\0,{3) and the 
posterior law p(x\y, 6, (3) are both Gaussian and we have 



p(x\y, 6,0} cx exp - Ax)\y - Ax) - ^9x t P x a;| , 



(4) 



and, if we note by 



x = argmin{j(a;) = (y — AxY(y — Ax) — Xx t P 1 x] with \ = 9/(3, (5) 
then, it is easy to show that 

x\y ~ Af (x, P) with J ~ ~ ^ P , AV t n -i (6) 

One can make a comparison with the classical regularization techniques for inverse 
problems with smoothness hypothesis, where Pq 1 = D l D with D a matrix ap- 
proximating a differentiation operator and A is called the regularization parameter 

0- 

What we address here is the generalization of the problem of the determination 
of the regularization parameter A which has been studied for a long time [ p2| , p3| , 
H, H || [27| H f0| |l| and is still an open problem. 

What is proposed here is to consider the general case where 6 and (3 are 
considered to be unknown and we are facing to make inference as well about x 
as about them. What we propose is to consider the hyperparameters 9 and /3 in 
the same manner than x, i.e; translate our prior knowledge about them by the 
probability laws p(0) and p{(3), then determine the posterior laws and finally infer 
about them from these posterior laws. 
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2. General Bayesian inference approach 

Assume now that we know the expressions of all the prior laws. We can then 
calculate the joint probability law: 

p(y, x, 8, p) = p(y\x, (3) p(x\8) p(0) p(/3). (7) 

In an ideal case where we are given A, y, (3 and 8, to infer x we can calculate 
the posterior law p(x\y, 8, (3) and if we choose as the solution to our problem the 
Maximum a posteriori (MAP) estimate, we have: 

x = argmax{p(a;|y,0,/3)} = argmax {p(y\x, (3) p(x\8)} . (8) 

But, unfortunately in practical situations we are not given (3 and 8 and the main 
problem is how to infer them. We consider the following situations: 

1. The first is to estimate the three quantities simultaneously. We call this 
method Joint Maximum a posteriori (JMAP) and the estimates are defined 
as 

(£,0,3) = arg max ■ {p(y\x,f3)p(x\8)p(/3)p(8)}. (9) 
(x,8,/3) 

One practical way to do this joint optimization is to use the following algo- 
rithm 

x k+1 = argmax ^p(y\x,(3 )p(x\8 )| 
8 = argmax jp(2 fe 1 0)p(0)j ( 10 ) 

(3 = arg max {p(y\x k ,(3) P (f3)} 

2. In the second case 8 and (3 are considered as the nuisance parameters and are 
integrated out of the problem and x is estimated by 



x = argmax {p(cc|y)} = argmax p(y, x, 8, (3) A8 &f3^ 

= Mgmax{Jp(y\x,f3)p((3)d(3jp(x\d)p(8)d8y (11) 

We call this method Marginalized MAP type one (MMAP 1 ). 
3. In the third case only 8 is considered as the nuisance parameter and is inte- 
grated out of the problem and x and (3 are estimated by 

(x,3) = arg max {p{x,(3\y, 8)} = arg max <^ / p(y,x, 8,(3)A8 \ 
(x,(3) (x,(3) Li J 

= arg max \p{y\x, (3) p{(3) f p(x\8) p(8) ddX . (12) 
(x,f3) I J J 

We call this method Marginalized MAP type two (MMAP 2 ). 
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4. Finally, in the last case we may first estimate 9 and (3 by 

; max {p(9, (3\y)} = arg max 
(0,(3) " (6,13) 



(0,3) = arg max {p(9,(3\y)} = arg max \ p(y,x,d, (3) dx \ 
(9,(3) (0,(3) [J J 

= arg max i p((3)p(9) I p(y\x,(3)p(x\9)dx 
(9,(3) I J J 



= arg max {p((3) P (0)l(O, (3\y)} . (13) 
(0,(3) 

and then used them for the estimation of x by 

x — argmax |p(a;|y, 9, /3)j . (14) 

We call this method Marginalized MAP type three (MMAP 3 ). 

Note that if p(9) and p((3) are uniform functions of 9 and (3, then 9 and 

(3 correspond to the classical maximum likelihood (ML) estimates because 

1(9, (3\y) is, for a given y, the likelihood function of 9 and (3. 

The calculus of 1(9, (3\y) is not easy and so is its optimization. Many works 

have been done on the subject. We distinguish three kind of methods: 

- The first is to use the Expectation-Maximization (EM) algorithm which has 
been developed exactly in the context of ML parameter estimation |3^, [|, |3j| . 

- The second is to estimate the integral using a Monte Carlo simulation 
method (Stochastic EM: SEM). 

- The third is to make some approximations. For example, at each iteration 
during the optimization, one may obtain an analytical expression for that in- 
tegral by approximating the expression inside it by a second order polynomial 
(Gaussian quadrature approximation). 

We will consider this last method. 



3. A case study 

Let us consider the following simple linear inverse problem y = Ax + n and make 
the following hypothesis: 

— The noise n is considered to be white, centered and Gaussian with precision 
(3, so that we have 

y\x,P^N(Ax,(3- 1 I) -^p(y\x,(3) = ^— exp (-±(3\\y - Ax\\ 2 

(15) 

where Z-y((3) cx (3 m l 2 . 

— Our prior prior knowledge about x can be translated by 

p(x\d) = ^exp{-~0<p(x)X. (16) 
where we will consider the following special cases for <j)(x): 
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• Gaussian priors: 

<j> G (x) = x'P^x = \\Dx\\ 2 — ► x\9 ~ Af(0, O^P^ 1 ), 

which can also be written (/>g(x) = J2jJ2i Pij x iXj with some special 
cases: _ 

4>g(x) = x % or <t>G(x) = \xj - 2^-1 1 2 - 

3 j 

• Generalized Gaussian priors: 

<Pgg(x) = Y \ x o - x i-A P , 1<P<2. 

3 

• Entropic priors: 

n 

<Pe(x) = ^ S(xj) where S(xj) = {x 2 , Xj Inxj — Xj, Inxj — Xj\ . 

3=1 

• Markovian priors: 

</>m{x) = V(xj,Xj), where V(xj, Xi) is a potential function 

3 ieNj 

and where Nj is a set of sites considered to be neighbors of site j, for 
example Nj = {j - 1, j + 1}, or Nj = {j - 2, j - 1, j + 1, j + 2}. 

Note that, in all cases 9 is generally a positive parameter. Note also that in 
the first case we have Z 2 (6) cx 6 n / 2 . Unfortunately we have not an analytic 
expression for Z 2 (0) in the other cases. However, in the situations we are 
concerned with, Z 2 (9) can either be calculated numerically or approximated 

by 

Z 2 (9) cx 9 an ' 2 . (17) 
— 9 and (3 are both positive parameters. We choose Gamma prior laws for them: 

9 ~ G(a, () — ► p{9) cx fl^-^exp {-(9} — ► E {9} = a/(, Var {9} = a/( 2 

(3 ~ Q(b, — > p((3) cx /^"^exp {-C/3} — * E {/?} = 6/C, Var {/?} - 6/C 2 
Now, using the following notations 

Q{x) = \\y- Ax\\ 2 , J (x) = l3Q{x) + 9<j>{x\ 

VQ{x) = -2A\y - Ax), and VJ (x) = f3VQ(x) + 0V0(sc), 

we can calculate the expression of the joint pdf p(y,x,9, (3) = p(y \x,(3) p(x \ 9) p{9) p((3), 
which can be written 



p{y, X, 9, 13) CX fl-Wa-a+l)^- (m/2-6+1) 
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with J x (x) = 0{Q(x) + 2C]+9{^(x) + 2(\ = -h(x) + 2Q(9 + 0) . (19) 

This will let us to go further in details of some of the above mentioned cases. For 
example in the Gaussian case we have: 

x\y, 6,0 ~ M(x, P) with x = (0 A 1 A + 9P^ 1 )~ 1 A l y and P = (0A l A + 9P^ 1 )~ 1 

i 2(2 — - OLft 

9\y, x,0~ G(a - an/2, -[^(x) + 2£]) — ► E {6\y, x, 0} 



2^ ' IJ " " s [<f>{x) + 2(Y 

j 2b — m 

0\y, x, 9~G(b- m/2, - [Q(x) + 2(]) — ► E {0\y, x, 8} = 

2 lQ(x) + 2Q 

Now, let us consider the four aforementioned methods a little more in details. 

3.1. Joint Maximum A Posteriori (JMAP) 

Using the expressions and the notations of the last paragraph in (11) we have 
to deal with the following algorithm: 

x k+1 = eagma X {p(y\x,0 k )p(x\P)} =argmm{j o (a!,3*,0 fc )}, 

6 k+1 = arg max jp(£ fc | (9) p(6»)j = argmin j[0(x fe ) + 2C]6> - (2a - cm - 2) In 6»j , 

k+1 = argmax|p(y|S fe ,/3)p(/3)| = argmm j[Q(£ fe ) + 2C]/3 - (26 - to - 2) In/?} . 

The two last equations have explicit solutions. In the case of Gaussian priors, the 
first equation has also an explicit solution. However, in general, we propose the 
following gradient based algorithm: 

Algorithm 1: x k+1 = (1 - fi)x k - fiV J (x k , k , 0*) 

= (1 -fi)x k -/i[0 k VQ(x k ) +6 k V4>{x k )}, < /* < 1, 

— t, , i (2a - an — 2) 

+ = - 1 -, a> (an + 2 /2, 

[<f>(x k ) + 2Q 

= ^ 7 6> m + 2 /2. 

[0(2") + 2C] 

The conditions a > (an + 2)/2 and 6 > (m + 2)/2 are added to satisfy, when 
necessary, the positivity constraint of 6 and /3. 

3.2. Marginalized Maximum A Posteriori MMAP 1 

Considering 6 and as the nuisance parameters and integrating out them from 
p(y, x, 0, 0) we obtain 

p(y,x) = JJ p(y,x,6,0)d0d9 cx [Q(x)+2(\^ m - 2 ^' 2 &(x)+2C}- (an - 2a)/2 (20) 

Now, defining x MMAP = argmax{p(a;|y)} = argmin j -J 2 (x)\, 

with J 2 (x) = (2a-an)\n[Q(x) + 2Q] + (2b-m)\n[(P(x) + 2Q], (21) 
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and trying to calculate this solution by an iterative gradient based algorithm, we 
have to calculate 

We propose then the following iterative algorithm: 
Algorithm 2: x k+1 = (1 - f i)x k - aV J 2 (x k ) 

= (1 - n)x k - ^[0 k VQ(x k ) + 6 k Vcp(x k )], < a* < 1, 

2* _ ( 2a - an ) 



mx k ) + 2(] 



a > an/2, 



ok _ m ) . m 

= t — , b>m/2. 

[Q(x k )+2Q 

3.3. Marginalized Maximum A Posteriori MMAP 2 

In this case, 9 only is considered as a nuisance parameter and is integrated out: 

p(y,x,0) = J p(y, x, 6, 0) d6 

oc f3- m/2+b ^[Hx) + 2Cr (a "- 2a)/2 cxp j-l/3[Q(x) + 2(] J .(22) 

Then, x and (3 are estimated by 

(x,0) = argmax{p(y,x,/3)} . (23) 
x,p 

Noting 

-2 lnp(y, x, 0) = -(26 - m - 2) ln/3 + (2a - an) ln[<£(x) + 2(] + /3[Q(x) + 2(] 
and differentiating it with respect to (3 gives 

- _ 26 - m - 2 
^~ [QN + 2C]" 

So, noting 

J 3 (x, 0) = (2a - an) ln[0(x) + 2(] + /3[Q(x) + 2C] (24) 

and V J 3 (x, /3) - ^—^ 4>(x) + 0VQ(x), 

and using a gradient based algorithm for minimizing J 3 with respect to x we 
propose the following: 

Algorithm 3: x k+1 = (1 - fi)x k+1 - f3 k VQ(x) - 0*V0(se), < u < 1, 

2a — an , „. ,_, 

^ = TTTZfc — , a > (an + 2)/2 



x 



; ) + 2C]' 



^ = 26m 2 5>(m + 2)/2 _ 

[Q(S fe ) + 2C] 
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3.4. Maximum Likelihood or MMAP 3 

In this case first x integrated out x from p(y, x, 9, (3) to obtain: 

P(y,0,(3) = J p{y,x,0,p)dx= h J exp|--J 1 (x,/3,^)| da; (25) 

with Ji(x,/9,fl)= J 9[Q(x) + 2C]+^(x) + 2C]. (26) 

Excepted the Gaussian case where J\ is a quadratic function of a;, in general, it is 
not easy to obtain an analytical expression for this integral. One can then try to 
make a Gaussian approximation which means to develop J\ around its minimum 
2 MA p = argmin^. { Ji(x, f3, 6)} by 

Ji(x,0,9) ~ ^(x - x MAP yM(x - Xmap) +g t (x - x MAP ) + c, (27) 

where g — (3VQ(x) + 9\7(f>(x) is the gradient of J\ and M is its Hessian, both 
calculated for x MAP . With this approximation we obtain 



Differentiating 1(6, (3\y) = \np(y,9, (3) with respect to (3 and 9 gives 
2b - m - 2 ~ 2a - an - 2 



(28) 



[Q(£ fe ) + 2C] + trace[M _1 A* A] ' \(j)(x k ) + 2(] + trace[M _1 P;r ' 

where Pq 1 is the Hessian of 0(je), A t A is the Hessian of Q(x) and M is the 
Hessian of Ji(a;): 

M(f3,9) = (3A t A + 9P Q 1 . 
Using these expressions we propose the following algorithm: 

Algorithm 4: x k = argmin { ,h{x, (3 k , 9 k ) } = M(f3 k , A'y, 



nk+l 



X 

2a — an — 2 



[(j){x k ) + 2C] + tracc[M _1 Po x ] ' 

£fc+i = 2b -m- 2 

\Q(x k ) + 2C] + tracefM" 1 A* A] ' 
This algorithm needs the inversion of the matrix M which is very costly in practice. 



4. Comparison and the main structure of the proposed algorithmes 

Comparing the Algorithms 1 to 4, one can see that they all have the same 
structure: 

— for fixed 9 and (3 optimize locally a criterion J(x, (3, 9), and 

— update 9 and (3 using the solution x just obtained and iterate until conver- 
gence. 

Note also that only in Algorithm 4, the updating step takes account of the 
measurement system operator A and the covariance structure P of the input x. 
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5. Conclusions and perspectives 

We considered the inverse problem of infcring the unknowns x from the data y in 
a special case of linear inverse problems y = Ax + n using a full bayesian approach 
and presented four algorithms to estimate simultanously the hyperparameters 9 
and [3 and the unknowns x. The main structure of all of these algorithms are 
the same even if the procedure to deduce them have been different. However, 
we have not yet really tested them to give any conclusion about their relative 
performances. Note however that one of them distinguishes itself from the others 
by taking account of the measurement system operator A and the covariance 
structure Po of x in the hyperparameters updating step and, by the same way, 
by its calculation cost. We hope to be able to give some measure of their relative 
performances in simulation and in real applications in near future. 
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